ABSTRACT

This paper describes a hierarchical distributed con- 
trol (HDC) model for coordinating cooperative 
problem-solving among intelligent systems. The 
model was implemented using SOCIAL, an inno- 
vative object-oriented tool for integrating hetero- 
geneous, distributed software systems. SOCIAL 
embeds applications in “wrapper” objects called 
Agents, which supply predefined capabilities for 
distributed communication, control, data specifi- 
cation and translation. The HDC model is real- 
ized in SOCIAL as a “Manager” Agent that coordi- 
nates interactions among application Agents. The 
HDC-Manager: indexes the capabilities of appli- 
cation Agents; routes request messages to suitable 
server Agents; and stores results in a commonly ac- 
cessible “Bulletin-Board”. This centralized control 
model is illustrated in a fault diagnosis application 
for launch operations support of the Space Shuttle 
fleet at NASA, Kennedy Space Center. 

Keywords: distributed artificial intelligence, sys- 
tems integration, hierarchical distributed control, 
intelligent control, cooperative problem-solving 

INTRODUCTION 

Knowledge-based systems are helping to automate 
important functions in complex problem domains 
such as operations and decision support. Successful 
deployment of intelligent systems requires: (a)
integration with existing, conventional software
programs and data stores; and (b) coordination
among systems to share complementary knowledge and
skills, much as people work together cooperatively
on related tasks. These requirements are difficult
to satisfy given existing AI technologies. Current 
knowledge-based systems are generally single-user, 
standalone systems based on heterogeneous data 
and knowledge models, development languages and 
tool shells, and processing platforms. Interfaces to 
users, databases, and other conventional software 
systems are typically custom-built and difficult to 
adapt or interconnect. Moreover, intelligent sys- 
tems developed independently of one another tend 
to be ignorant of information resources, problem- 
solving capabilities, and access protocols for peer 
systems. 

SOCIAL is an innovative collection of object- 
oriented tools designed to alleviate these pervasive 
integration problems [Ad 90b]. SOCIAL provides
a family of “wrapper” objects, called Agents, that 
supply predefined capabilities for distributed com- 
munication, control, data specification and transla- 
tion. Developers embed programs within Agents, 
using high-level, message-based interfaces to spec- 
ify interactions between programs, their embedding 
Agents, and other application Agents. The rel- 
evant Agents transparently manage the transport 
and mapping of specified data across networks of 
disparate processing platforms, languages, devel- 
opment tools, and applications. SOCIAL’s parti- 
tioning of generic and application-specific behav- 
iors shields developers from network protocols and 
other low-level complexities of distributed comput- 
ing. More important, the interfaces between appli- 
cations and SOCIAL Agents are modular and non- 
intrusive, minimizing the number, extent, and cost 
of modifications necessary to re-engineer existing 
systems for integration. Non-intrusiveness is
particularly important in mission-critical space and
military applications, where alterations for integration
entail stringent validation and verification testing. 

This paper focuses on a specialized “Manager”
Agent that realizes a hierarchical distributed
control (HDC) model on top of SOCIAL’s basic
integration services. The SOCIAL HDC-Manager
Agent coordinates the activities of Agents that
embed independent knowledge-based and conventional
applications relating to a common domain such as 
decision or operations support. Such centralized 
control models are important for managing dis- 
tributed systems that evolve over time through the 
addition of new applications and functions. Cen- 
tralized control is also important for organizing 
complex distributed systems that display not only 
small-scale, one-to-one relationships, but also large- 
scale structure, such as clustering of closely related 
subsets of application elements. 

The HDC-Manager Agent’s coordination func- 
tionality derives from a set of centralized control 
services including: maintaining an index knowledge 
base of the capabilities, addresses, and access mes- 
sage formats for application Agents; formatting and 
routing requests for data or problem-solving pro- 
cessing to suitable server Agents; and posting re- 
quest responses and other globally useful data to a 
commonly accessible “Bulletin-Board”. 

The next section of the paper reviews the over- 
all architecture and functionality of SOCIAL. Sub- 
sequent sections describe the structure and behav- 
ior of the HDC-Manager Agent and illustrate its 
application in the domain of launch operations sup- 
port for the Space Shuttle fleet at NASA, Kennedy 
Space Center. Specifically, an HDC-Manager Agent
coordinates the activities of standalone expert sys- 
tems that monitor and isolate faults in Shuttle vehi- 
cle and Ground Support systems. The cooperative 
problem-solving enabled by the HDC-Manager pro- 
duces diagnostic conclusions that the applications 
are incapable of reaching individually. 


OVERVIEW OF SOCIAL 

The central problems of integrating heterogeneous 
distributed systems include: 

• communicating across a distributed network of 
heterogeneous computers and operating sys- 
tems in the absence of uniform interprocess 
communication services; 

• specifying and translating information (i.e., 
data, knowledge, commands), across applica- 
tions, programming languages and develop- 
ment shells with incompatible native data rep- 
resentations; 

• coordinating problem-solving across applications
and development tools that rely on different
internal models for communication and
control. 

SOCIAL addresses these issues through a unified
collection of object-oriented tools for distributed
communication, control, and data (and data type)
specification and management. Developers
access the services provided by each tool
through high-level Application Programming Inter- 
faces (APIs). The APIs conceal the low-level com- 
plexities of implementing distributed computing 
systems. This means that distributed systems can 
be developed by programmers who lack expertise in 
areas such as interprocess and network communica- 
tion (e.g., Remote Procedure Calls, TCP/IP, ports 
and sockets), variations in data architectures across 
vendor computer platforms, and differences among 
data and control interfaces for standard develop- 
ment tools such as AI shells. Moreover, SOCIAL’s 
high-level development interfaces to distributed ser- 
vices promote modularity, maintainability, extensi- 
bility, and portability. 

The overall SOCIAL architecture is summarized
in Figure 1. SOCIAL’s predefined distributed
processing functions are bundled together in objects
called Agents. Agents represent the active
computational processes within a distributed system.
Developers assemble distributed systems by:
(a) selecting and instantiating Agents from
SOCIAL’s library of predefined Agent classes; and (b)
embedding individual application elements such as
programs and databases within Agents. Embedding
consists of using the APIs for accessing SOCIAL’s
distributed processing capabilities to establish
the desired interactions between applications,
their associated wrapper Agents, and other
application Agents. New Agent subclasses can be
created through a separate development interface by
customizing or combining services in novel ways to
satisfy unique application requirements. These new
Agent types can be incorporated into SOCIAL’s
Agent library for subsequent reuse or adaptation.
The following subsections review the component 
distributed computing technologies used to con- 
struct SOCIAL Agents. 


[Figure 1 layers, top to bottom:
Library of Agent Classes (Managers, Gateways);
Distributed Control and Information Access;
Data Management (MetaData);
Distributed Communications (MetaCourier);
Network, Processor, and Operating System Platforms]

Figure 1: Architecture of the SOCIAL Toolset


Distributed Communication 

SOCIAL’s distributed computing utilities are or- 
ganized in layers, enabling complex functions to 
be built up from simpler ones. The base or sub- 
strate layer of SOCIAL is the MetaCourier tool, 
which provides a high-level, modular distributed 
communications capability for passing information 
between applications based on heterogeneous lan- 
guages, platforms, operating systems, networks, 
and network protocols [Sy 90].

The Agent objects that integrate computer-based
applications or resources are defined at SOCIAL’s
MetaCourier level. Developers use the
MetaCourier API to pass messages between
applications and their embedding Agents, as well as
among application Agents. Messages typically
consist of: commands that an Agent passes directly
into its embedded application, such as database
queries or calls to execute signal processing
programs; data arguments to program commands that
an Agent might call to invoke its embedded
application; and symbolic flags or keywords that
signal the Agent to invoke one of several fully
preprogrammed interactions with its embedded
application. For example, a high-level MetaCourier
API call issued from a local LISP-based application
Agent such as:

    (Tell :agent 'sensor-monitor :sys 'Symb1
          '(poll measurement-X))

transports the message contents, in this case a
command to poll measurement-X, from the calling
program to the Agent sensor-monitor resident on
platform Symb1. The Tell function initiates a message
transaction based on an asynchronous communication
model; the application Agent that issues such
a message can immediately move on to other
processing tasks. The MetaCourier API also provides
a synchronous “Tell-and-Block” message function
for “wait-and-see” processing models.
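The difference between the asynchronous Tell and the synchronous Tell-and-Block models can be illustrated with a minimal Python sketch. It is not the MetaCourier API: the `Agent` class, the `tell` and `tell_and_block` functions, the reply queue, and the simulated sensor values are all assumptions made for illustration.

```python
import queue
import threading
import time

class Agent:
    """Toy stand-in for a MetaCourier Agent (hypothetical, not the real API)."""

    def __init__(self, name, handler):
        self.name = name
        self.handler = handler          # plays the role of the embedded application
        self.inbox = queue.Queue()
        worker = threading.Thread(target=self._serve, daemon=True)
        worker.start()

    def _serve(self):
        # Process messages one at a time, in arrival order.
        while True:
            message, reply_box = self.inbox.get()
            reply_box.put(self.handler(message))

def tell(agent, message):
    """Asynchronous send: returns at once with a box that will hold the reply."""
    reply_box = queue.Queue(maxsize=1)
    agent.inbox.put((message, reply_box))
    return reply_box

def tell_and_block(agent, message, timeout=5.0):
    """Synchronous send: waits until the reply arrives or the timeout expires."""
    return tell(agent, message).get(timeout=timeout)

def poll(message):
    command, measurement = message
    time.sleep(0.05)                    # simulate a slow sensor read
    return (measurement, 42.0)          # made-up reading

sensor_monitor = Agent("sensor-monitor", poll)

pending = tell(sensor_monitor, ("poll", "measurement-X"))   # caller does not wait
blocking_reply = tell_and_block(sensor_monitor, ("poll", "measurement-Y"))
async_reply = pending.get(timeout=5.0)                      # collected later
```

In this sketch the caller of `tell` is free to do other work before collecting the reply, whereas `tell_and_block` returns only once the reply is available.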

Agents contain two procedural methods that 
control the processing of messages, called in-filters 
and out-filters. In-filters parse incoming messages 
according to the argument list structure specified 
when the Agent is defined. After parsing the mes- 
sage, an in-filter typically either invokes the Agent’s 
embedded resource or application, or passes the 
message (which it may modify) onto another Agent. 
The MetaCourier semantic model entails a directed 
acyclic computational graph of passed messages. 
When no further passes are required, the in-filter
of the terminal Agent runs to completion. This
Agent’s out-filter method is then executed to
prepare a message reply, which is automatically
returned (and possibly modified) through the
out-filters of intermediate Agents back to the
originating Agent (i.e., the target Agent for the original
Tell call).
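The order in which in-filters and out-filters run along a chain of Agents can be sketched as follows. This is a simplified Python illustration under stated assumptions: the `Agent` class, the list-based message format, and the `trace` list are hypothetical, and the chain is executed by plain recursion rather than by a scheduler.

```python
class Agent:
    """Hypothetical Agent with an in-filter and an out-filter."""

    def __init__(self, name, forward_to=None, application=None):
        self.name = name
        self.forward_to = forward_to      # next Agent, or None if terminal
        self.application = application    # embedded program, if any

    def in_filter(self, message, trace):
        trace.append(f"in:{self.name}")
        if self.forward_to is not None:
            # Pass the (possibly modified) message on to another Agent.
            reply = self.forward_to.in_filter(message + [self.name], trace)
        else:
            # Terminal Agent: invoke the embedded application.
            reply = self.application(message)
        return self.out_filter(reply, trace)

    def out_filter(self, reply, trace):
        trace.append(f"out:{self.name}")
        return reply + [f"via {self.name}"]   # each out-filter may modify the reply

def embedded_app(message):
    return [f"handled {message[0]}"]

terminal = Agent("C", application=embedded_app)
middle = Agent("B", forward_to=terminal)
origin = Agent("A", forward_to=middle)

trace = []
reply = origin.in_filter(["poll"], trace)
```

Here `trace` records in-filters running outward along the chain (A, B, C) and out-filters running back in reverse order (C, B, A).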

A MetaCourier runtime kernel resides on each
application host. The kernel provides: (a) a uniform
message-passing interface across network platforms;
and (b) a scheduler for managing messages
and Agent processes (i.e., executing filter methods).
Each Agent contains two attributes that specify
its associated Host and Environment objects.
These MetaCourier objects define particular hardware
and software execution contexts for Agents,
including the host processor type, operating system,
network type and address, language compiler,
linker, and editor. The MetaCourier kernel uses
the Host and Environment associations to manage
the hardware and software platform-specific
dependencies that arise in transporting messages between
heterogeneous, distributed Agents (cf. Figure 2).


[Figure 2 diagram: Agent-A, Env-A, Host-A, Host-B, Env-B, Agent-B]

Figure 2: Operational Model of MetaCourier


MetaCourier’s high-level message-based API is 
basically identical across different languages such 
as LISP or C. MetaCourier’s communication model 
is also symmetrical or “peer-to-peer”. In contrast, 
client-server models based on remote procedure call
communication formalisms (RPCs) are asymmetric:
only clients can initiate communication, and
while multiple clients can interact with a particular
server, a specific client process can only
interact with a particular server. Moreover, until
recently, RPCs were restricted to inefficient syn- 
chronous (i.e., blocking) communications. 

Data Specification and Translation 

A major difficulty in getting heterogeneous applica- 
tions and information resources to interact with one 
another is the basic incompatibility of their under- 
lying models for representing data, knowledge, and 
commands. These problems are compounded when
applications are distributed across heterogeneous
computing platforms with different data
architectures (e.g., opposing byte ordering conventions).

SOCIAL’s MetaData subsystem addresses these
complex compatibility problems. MetaData provides
an object-oriented data model for specifying
and manipulating data and data types across
disparate applications and platforms. Developers
access these tools through a dedicated API.
MetaData handles three basic tasks: (a) encoding and
decoding basic data types (e.g., character, integer,
float) transparently across different machine
architectures; (b) describing data of arbitrarily
complex abstract types (e.g., database records, frames,
b-trees) in a uniform object-oriented model; and
(c) mapping between data models native to particular
applications and SOCIAL’s generic data model.
Like MetaCourier, MetaData’s object-oriented API
is basically uniform across different programming
languages. Also, MetaData allows new types to be
defined and manipulated dynamically at runtime.
Most alternative data management tools, such as
XDR, are static and non-object-oriented.
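Task (a), encoding basic data types independently of machine architecture, can be illustrated with Python's standard `struct` module. This is only a sketch of the general technique, not MetaData itself: the wire format (one character, one 32-bit integer, one 64-bit float, always big-endian) and the `encode` and `decode` helpers are assumptions chosen for the example.

```python
import struct

# Hypothetical wire format: one char, one 32-bit integer, one 64-bit float,
# always big-endian ("network order"), whatever the host's native order is.
WIRE_FORMAT = ">cid"

def encode(char, integer, real):
    """Pack native Python values into the platform-neutral wire format."""
    return struct.pack(WIRE_FORMAT, char.encode("ascii"), integer, real)

def decode(payload):
    """Unpack wire bytes back into native Python values."""
    raw_char, integer, real = struct.unpack(WIRE_FORMAT, payload)
    return raw_char.decode("ascii"), integer, real

payload = encode("Z", 1991, 2.5)
decoded = decode(payload)

# The same integer laid out in the two opposing byte orders.
big_endian = struct.pack(">i", 1991)
little_endian = struct.pack("<i", 1991)
```

Because both sides agree on an explicit byte order, a sender and a receiver with opposing native conventions still recover the same values; the two packed integers differ only in the order of their bytes.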

SOCIAL integrates MetaData with MetaCourier
to obtain transparent distributed communication
of complex data and data types across
heterogeneous computer platforms as well as across
disparate applications: developers embed MetaData
API function calls within the in-filter and
out-filter methods of interacting Agents, using
MetaCourier messages to transport MetaData objects
across applications residing on distributed hosts.
MetaData API functions decode and encode mes- 
sage contents, mapping information to and from 
the native representational models of source and 
target applications and MetaData objects. SO- 
CIAL thereby separates distributed communication 
from data specification and translation, and cleanly 
partitions both kinds of generic functionality from 
application-specific processing. 

Distributed Control and Information Access 

SOCIAL’s third layer of object-oriented tools
establishes custom, high-level APIs for Agent
classes specialized for particular integration or
coordination functionality. Lower-level MetaCourier
and MetaData API functions are used to construct
these Agent API data and control interfaces. The
high-level APIs largely conceal MetaCourier and
MetaData interfaces from SOCIAL users. Thus,
developers of distributed systems typically use the
predefined specialized Agent classes as the top-level
building blocks for satisfying their particular
architectural requirements, accessing the functionality of
each such Agent type through its dedicated
high-level API.

For example, applications and data stores are 
often constructed using standard development tools 
such as database management systems (DBMSs) 
and AI shells. SOCIAL Database and Knowl- 
edge Gateway Agent classes abstract and isolate the 
application-independent aspects of the control and 
data interfaces to such shells as generic, reusable 
API interfaces. Similarly, Managers are specialized 
Agents for coordinating applications integrated via 
Gateways and other kinds of SOCIAL Agents to 
work together cooperatively. The HDC-Manager,
described in the following sections, defines one kind
of centralized model for distributed control of
application Agents. If necessary, developers can access
SOCIAL’s various tool layers to modify or extend
existing Agent APIs and define corresponding new
subclasses of Gateway, Manager, or fully custom
Agents.

Gateway and Manager Agent APIs depend 
heavily on MetaCourier and MetaData capabilities 
for controlling and manipulating messages. For in- 
stance, the root Knowledge Gateway Agent class 
defines standard MetaCourier in-filter and out-filter 
methods. The generic in-filter parses and processes 
messages that represent either: (a) incoming
requests initiated by other application Agents; or (b)
requests initiated by the Agent’s embedded
application to pass on to other application Agents.
Similarly, the generic out-filter either: (a) assembles
responses to incoming requests; or (b) relays replies
to prior outgoing requests back to the embedded
application. All subclasses of Knowledge Gateway
Agents inherit these generic shell- and
application-independent control behaviors.

SOCIAL uses MetaData to define a uniform
representational model for data and knowledge
structures such as relational tuples, facts, fact- 
groups, rules, and frames. Subclasses of the root 
Knowledge Gateway Agent class define API func- 
tions that map transparently between SOCIAL’s 
canonical representation and the knowledge mod- 
els native to particular AI development tools. For 
example, CLIPS Knowledge Gateway Agents trans- 
late between MetaData frames and CLIPS deftem- 
plates, while KEE Gateway Agents map between 
MetaData frames and KEE unit objects. Meta- 
Data objects are also used to pass commands to 
AI shells associated with Gateway Agent subclasses 
(e.g., to reset a fact base, execute a rule-based
inference engine, or load application files). This
uniform mapping approach reduces the complexity of
interconnecting N disparate systems from O(N*N)
to O(N).
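The reduction from O(N*N) to O(N) follows because each tool needs only one pair of mappings to and from the canonical representation, rather than a dedicated translator for every other tool. The Python sketch below illustrates the idea under loud assumptions: the tuple format standing in for a CLIPS fact, the dictionary standing in for a KEE unit, and the dictionary used as the canonical frame are invented for this example and do not reproduce the real data models.

```python
# Each tool gets one pair of mappings to and from a canonical frame
# (here a plain dict). All formats below are illustrative assumptions.

def clips_to_frame(fact):
    # Hypothetical CLIPS-like fact: ("valve", ("id", "V1"), ("state", "open"))
    name, *slots = fact
    return {"type": name, "slots": dict(slots)}

def frame_to_clips(frame):
    return (frame["type"], *sorted(frame["slots"].items()))

def kee_to_frame(unit):
    # Hypothetical KEE-like unit: {"unit": "valve", "id": "V1", "state": "open"}
    slots = {k: v for k, v in unit.items() if k != "unit"}
    return {"type": unit["unit"], "slots": slots}

def frame_to_kee(frame):
    return {"unit": frame["type"], **frame["slots"]}

GATEWAYS = {
    "clips": (clips_to_frame, frame_to_clips),
    "kee": (kee_to_frame, frame_to_kee),
}

def translate(data, source, target):
    """Map source -> canonical frame -> target; no direct pairwise translator."""
    to_frame, _ = GATEWAYS[source]
    _, from_frame = GATEWAYS[target]
    return from_frame(to_frame(data))

def translators_needed(n):
    """Return (pairwise, canonical) translator counts for n tools."""
    return n * (n - 1), 2 * n

clips_fact = ("valve", ("id", "V1"), ("state", "open"))
kee_unit = translate(clips_fact, "clips", "kee")
round_trip = translate(kee_unit, "kee", "clips")
```

Adding a third tool to this sketch would mean writing two new functions and one new `GATEWAYS` entry, with no change to the existing translators.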

Developers integrate a shell-based application 
by embedding it in an instance of the relevant Gate- 
way Agent subclass, using its shell-specific API 
functions to program the required interactions: the 
API calls specify the particular data or commands 
to be injected into, or extracted from, the em- 
bedded application’s knowledge bases through the 
shell’s control interfaces. The strategy of modular,
specialized interface functionality used in
SOCIAL’s predefined Gateway Agent classes is
directly applicable for designing custom types of
Agent for integrating standalone applications or
systems implemented using in-house, proprietary 
development shells. 


Taken together, the SOCIAL tools described thus
far - MetaCourier, MetaData, and Gateway Agents
- provide adequate support for integrating
conventional heterogeneous software systems. The
interactions in such systems are relatively few in
number and can be prescribed in a fixed, determinate
order. For example, satellite telemetry data can be
fed to Agents embedding signal processing and
pattern recognition programs to clean and filter data
and check for significant events. Database Gateway
Agents would be used to dump all data to
archival storage, while extracting and writing
noteworthy events to on-line storage systems. Scientists
might then study the data through Agents that em- 
bed data analysis and visualization programs, pos- 
sibly through an intermediary User Interface Agent. 
The developer of such a system would embed the 
constituent applications and databases in suitable 
Agents and specify their direct, one-to-one interac- 
tions in terms of the relevant Agent APIs. 

Once autonomous knowledge-based systems are
incorporated into a distributed system, integration 
tools alone are no longer fully adequate. First, the 
sequencing or composition of behaviors both within 
and across autonomous systems is typically deter- 
mined dynamically, based on the content of incom- 
ing data at run-time. A decentralized approach 
to managing such data-driven behaviors quickly 
becomes intractable. The problem is particularly 
acute in distributed systems that evolve over an ex- 
tended lifecycle, in which application elements are 
enhanced, added, superseded, or reorganized (i.e., 
broken apart or consolidated), over time. 

Second, complex organizational relationships 
emerge among clusters of applications in large-scale 
distributed intelligent systems. For example, pro- 
grams that automate on-line operations of com- 
puter networks and other complex systems are nat- 
urally coupled more closely to one another than 
to decision or maintenance support tools for the 
same target domain. At the same time, inter- 
actions regularly take place between applications 
across functional groupings. For instance, planning
and scheduling tools (decision support) dictate
system configuration activities (operations support),
while behavioral anomalies detected in on-line
systems (operations support) trigger and guide
troubleshooting, diagnosis, and repair activities
(maintenance support). Ideally, interfaces for such
cross-group interactions should be designed at the
cluster level, to minimize sensitivity to modifications
within functional groups.

In short, the intelligent application elements of
complex distributed systems must not only be
integrated, but also coordinated. Systematic
coordination is necessary both to manage large numbers of
dynamically evolving interaction pathways and to
capture complex logical relationships among
functional elements. The HDC-Manager is the first
specialized class of SOCIAL Agents developed to
address these organizational requirements.

HDC-Manager Functionality 

In essence, the HDC model performs the same role 
in a complex distributed system played by a hu- 
man manager in a large organization. The HDC- 
Manager is associated with one or more applica- 
tion Agents, called its subordinates. The HDC- 
Manager’s specific functions or responsibilities with 
respect to these other Agents include: 

• providing a centralized Index knowledge base
that specifies: each available information
source or problem-solving service; the subordinate
Agent that can provide that resource; the
Agent’s logical address; and a procedure for
converting data in a generic resource request
into a message format that is suitable for that
server Agent;

• analyzing tasks requesting information or 
problem-solving services based on the Index 
knowledge base and routing suitable messages 
to the relevant subordinate Agents to accom- 
plish those tasks; 

• mediating all interactions with external (i.e., 
non-subordinate) Agents; 

• providing a centralized Bulletin-Board to store 
problem-solving results and other data of com- 
mon utility for shared access by subordinate 
and external Agents.
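The responsibilities listed above can be sketched in a few lines of Python. This is a minimal illustration, not the SOCIAL implementation: the `HDCManager` class, its method names, the `precedent_agent` function, and the address string are hypothetical, and a plain function call stands in for message transport between Agents.

```python
class HDCManager:
    """Minimal, hypothetical sketch of the Manager's Index and Bulletin-Board."""

    def __init__(self):
        self.index = {}           # service name -> (agent address, message formatter)
        self.agents = {}          # agent address -> callable standing in for the Agent
        self.bulletin_board = []  # shared postings

    def register(self, service, address, agent, formatter):
        self.index[service] = (address, formatter)
        self.agents[address] = agent

    def request(self, service, requester, **data):
        """Route a generic request to the subordinate that offers the service."""
        if service not in self.index:
            raise KeyError(f"no subordinate Agent offers {service!r}")
        address, formatter = self.index[service]
        message = formatter(data)              # generic data -> server's format
        result = self.agents[address](message)
        self.bulletin_board.append({
            "type": "reply", "service": service,
            "posted_by": address, "requested_by": requester, "content": result,
        })
        return result

def precedent_agent(message):
    # Stand-in for an embedded case-lookup application.
    return f"precedents for {message[1]}"

manager = HDCManager()
manager.register(
    "find-fault-precedents", "host-a/precedents", precedent_agent,
    formatter=lambda data: ("lookup", data["fault"]),
)
answer = manager.request("find-fault-precedents", requester="monitor",
                         fault="valve-leak")
```

The requester in this sketch names only the symbolic service; it never sees the server's address or message format, both of which live in the Index.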

The HDC-Manager realizes the advantages of 
a centralized control architecture in a complex dis- 
tributed system: 

• efficient global coordination of individual 
problem-solving Agents; 

• modularity; 
• extensibility and maintainability;

• support for heterogeneity. 

Modularity derives from the HDC-Manager’s 
centralized Bulletin-Board and Index knowledge 
base. Each Index entry identifies services avail- 
able from subordinate Agents symbolically (e.g., 
find-fault-precedents). Each Bulletin-Board post- 
ing identifies the item type, such as service request 
or reply, the posting Agent, and the requesting 
Agent (when appropriate). Subordinate Agents do 
not need to know about the functionality, struc- 
ture, or even the existence of any other application 
Agents; they only require: (a) the generic high- 
level API interface to the HDC-Manager Agent; 
and (b) knowledge of the symbolic names used by 
the HDC-Manager to index the resource types avail- 
able within a specific distributed application. The 
same minimal requirements hold for external ap- 
plication and Manager Agents that need to inter- 
act with an HDC-Manager to obtain information or 
services from its subordinates. 

Extensibility and maintainability follow from
the HDC-Manager’s modular architecture. New
subordinate Agents are incorporated simply by: (a)
updating the Index knowledge base with appropriate
entries to describe their services; and (b) extending
other subordinate Agents, as needed, to be able
to request new capabilities and process the results.
Moreover, existing subordinate Agents can be re- 
configured with minimal disruption. For example, 
suppose that problem-solving functions in one sub- 
ordinate Agent are reallocated to some other appli- 
cation Agent, old or new. Neither the Manager’s 
other subordinates nor any external Agents have to 
be modified; only the HDC-Manager Index knowl- 
edge base needs to be updated to reflect the config- 
uration changes. 

Heterogeneity follows from the modular nature 
of SOCIAL Agents in general. The high-level API 
to the HDC-Manager Agent’s coordination capa- 
bilities is distinct from, but fully compatible with, 
the API interfaces used to embed applications with 
Gateways or other kinds of SOCIAL Agents. Thus, 
the HDC-Manager Agent operates transparently
with respect to the physical distribution and
internal architectures (i.e., communication, control, and
knowledge structures) of the subordinate and external
Agents with which it interacts. Consequently, an
HDC-Manager can subordinate standalone applica- 
tion Agents, Gateway Agents, and other Manager 
Agents, HDC or otherwise, with equal ease. In par- 
ticular, HDC-Manager Agents can be organized in 
a nested hierarchy to support large complex
distributed systems, as illustrated in Figure 3.

Figure 3: Hierarchy of HDC-Manager Agents


HDC-Manager Architecture and Operation 

The HDC-Manager Agent was implemented using
SOCIAL and Common Lisp in a uniform, fully
object-oriented manner. The Agent comprises
a collection of state variables and utility methods
(cf. Figure 4). The value for each state variable
consists of a list of MetaData objects, such
as Agent-Index-Items. Each such object type has
associated test predicate, instance creation, and
access functions (e.g., Agent-Index-ItemP,
Create-Agent-Index-Item, Check-Agent-Index). These
low-level functions, written using the MetaData
API, are invoked through a higher-level
HDC-Manager API and are transparent to developers.
Five state variables store the static and dynamic
information necessary for the HDC-Manager to function:
• a Task-Agenda for posting, prioritizing, and
dispatching service request Tasks; 

• a Bulletin-Board for posting Task results and 
other data items to a common memory store 
accessible to all subordinate and external 
Agents; 

• an Index knowledge base that enumerates sub- 
ordinate Agents, their logical locations, server 
capabilities, and functions for assembling data 
into suitable command message formats; 

• a set of Prioritization Conditions for order- 
ing the Agenda queue of pending Tasks to be 
routed by the Manager; 

• an Activity Log for tracing and debugging
Manager behavior during application
development.

State Variables:
    Task Agenda
    Bulletin-Board
    Index of subordinate Agents
    Task Priorities
    Action Log

Procedural Methods:
    MetaCourier Methods: in-filter, out-filter
    Utility Methods: HDC control model methods;
        access methods for Manager state variables
    Application-Specific Methods: local Manager
        task handlers, external interfaces

Figure 4: HDC-Manager Agent Structures

Currently, the Index knowledge base and the
Prioritization Conditions represent static structures
that are specified once, when an HDC-Manager
Agent is first defined to coordinate a specific
set of applications. Typically, changes are made
during development, but infrequently thereafter,
when subordinate application Agents are added,
removed, or restructured. (Self-adapting systems
could modify Priority Conditions or even create
new server Agents dynamically; however, we have
not yet investigated such possibilities.) The
remaining three state variables are more dynamic
structures: their contents change continually as the
HDC-Manager regulates an operational distributed
system.

The HDC-Manager Agent also incorporates 
four sets of supporting procedural methods: 

• in-filter and out-filter methods for parsing and 
responding to incoming MetaCourier messages 
and processing replies to outgoing messages; 

• auxiliary API utility methods that realize
the HDC-Manager’s global hierarchical control
model;

• auxiliary API utility methods for creating and 
modifying HDC-Manager data items, posting 
them to and retrieving them from the Agent’s 
state variables; 

• optional methods for handling application 
Tasks within the HDC-Manager itself. Such 
methods may be called for when a Manager 
Agent is configured as a subordinate to other 
Manager Agents. 

The HDC-Manager’s specialized coordination
functions and information structures are accessed
through a dedicated API (cf. Figure 5). This
API is implemented via lower-level MetaData and
MetaCourier capabilities and APIs. The high-level
API shields SOCIAL developers from the underlying
mechanics of packaging, transporting, and
deciphering messages containing HDC-Manager
commands and data structures among heterogeneous
Agents. The API utility methods for accessing
HDC-Manager state variables and their contents
include:

• two methods for searching the Index Knowl- 
edge Base and Bulletin-Board. Both meth- 
ods call a generic symbolic pattern-matching 
function with application-specific search con- 
ditions; 

• a single Create-Item method, which dispatches
to the various functions that create instances
of HDC-Manager MetaData object types: Index
and Bulletin-Board entry items,
Priority-Conditions, and Tasks (Service Requests);

• four parallel methods for modifying
HDC-Manager state variables: Task-Agenda, Index
Knowledge Base, Bulletin-Board, and
Priority-Conditions. Each method supports keyword
options for resetting the variables and for posting
and deleting specific data items.

The HDC-Manager API is also used to invoke
the utility methods that implement the HDC con- 
trol model. These methods are generally triggered 
automatically, from the Manager’s predefined in- 
filter and out-filter, but can be activated by other 
Agents as required. HDC-Manager control meth- 
ods include: 

• a Command-Manager method, which dis- 
patches API command messages, either to in- 
ternal HDC-Manager API methods or to the 
Command-Manager of another HDC-Manager 
Agent. This method also traps illegal com- 
mands; 

• an Initialize method for resetting the HDC- 
Manager state variables and performing any 
application-specific actions; 

• a Prioritize-Agenda method for sorting pending
Tasks for the Manager to dispatch in accordance
with the declarative ordering conditions
specified in the Priority-Conditions
state variable. Each condition specifies a Task
Attribute (e.g., service-category, priority) and
an optional list of Attribute values for ordinal
sorting;

• a Task-Dispatcher method, which uses the In- 
dex Knowledge Base to generate and send a 
suitable command message to the relevant sub- 
ordinate Agent requesting the service specified 
in a Task; 

• a top-level Control-Cycle method that invokes
the Prioritize-Agenda and Task-Dispatcher
methods that ground the HDC model;


• a Log-Utility method. A menu-driven
trace/debug facility can be used to toggle the
Activity Log, which tracks all messages processed
by the Command-Manager, and other
flags that trace all runtime modifications to
Manager state variables.

Control Methods      Data Structure Accessors

Command-Manager      Check-Agent-Index
Control-Cycle        Check-Bulletin-Board
Initialize           Create-Item
Log-Utility          Modify-Agenda
Prioritize-Agenda    Modify-Agent-Index
Task-Dispatcher      Modify-Bulletin-Board
                     Modify-Priority-Conditions

Figure 5: HDC-Manager Agent Utility Methods
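
The Prioritize-Agenda, Task-Dispatcher, and Control-Cycle methods described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the SOCIAL code: Tasks are plain dictionaries, each Priority-Condition is an (attribute, optional value list) pair, values sort in ascending order when no list is given, the index maps a service name to a (server, command-builder) pair, and `send` is a caller-supplied stand-in for message delivery. Whether the actual Control-Cycle dispatches one Task or the whole agenda per cycle is not stated in the text; the sketch dispatches all pending Tasks.

```python
def prioritize_agenda(agenda, conditions):
    """Sort Tasks by each (attribute, ordering) condition in turn."""
    def sort_key(task):
        parts = []
        for attribute, ordering in conditions:
            value = task.get(attribute)
            if ordering is not None:
                # Ordinal sort: rank by position in the declared value list;
                # values missing from the list sort last.
                rank = ordering.index(value) if value in ordering else len(ordering)
                parts.append(rank)
            else:
                # Natural ascending sort; Tasks lacking the attribute sort last.
                parts.append((value is None, value))
        return tuple(parts)
    return sorted(agenda, key=sort_key)


def task_dispatcher(task, index, send):
    """Look up a server for the Task's service and send it a command message."""
    entry = index.get(task["service"])
    if entry is None:
        raise LookupError(f"no server indexed for {task['service']!r}")
    server, build_command = entry
    send(server, build_command(task))
    return server


def control_cycle(agenda, conditions, index, send):
    """Prioritize the pending Tasks, then dispatch each one in order."""
    ordered = prioritize_agenda(agenda, conditions)
    agenda.clear()
    return [task_dispatcher(task, index, send) for task in ordered]
```

For example, the conditions `[("service", ["diagnose", "report"]), ("priority", None)]` would dispatch all "diagnose" Tasks before any "report" Task, breaking ties by ascending priority value.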

The Command-Manager enforces regulated hi- 
erarchical control channels. A subordinate Agent 
can communicate with any HDC-Manager Agent 
within a Manager hierarchy; however, any such 
message is processed first by the Agent’s immedi- 
ate superior and then by all intervening Managers. 
This design makes it possible to define and enforce 
alternative organizational reporting policies. The 
default policy is very flexible: messages are tracked, 
but passed along the Manager hierarchy without fil- 
tering. Thus, any subordinate Agent can access the 
Bulletin-Board or request services from other Man- 
agers through its immediate Manager. More or less 
restrictive control architectures may be appropri- 
ate under different conditions. For example, mes- 
sages from subordinates to higher-level Managers 
in time-critical applications could be screened or 
prioritized. Parallel, antagonistic or competitive 
models can also be explored. Finally, the HDC-Manager
architecture does not preclude a subordinate
reporting to multiple Managers, thus permitting
non-hierarchical models (e.g., matrix structures),
or elaborate hybrid organizational structures.
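
The regulated control channels described in this paragraph can be sketched as below. The class names, the `allow` policy hook, and the message fields are hypothetical, and the sketch assumes that the target Manager is an ancestor of the Manager that first receives the message. The default policy tracks and forwards every message; a subclass shows how a more restrictive screening policy might be substituted.

```python
class Manager:
    """Hypothetical Manager that tracks messages and passes them upward."""

    def __init__(self, name, superior=None):
        self.name = name
        self.superior = superior
        self.activity_log = []

    def allow(self, message):
        # Reporting policy hook. Default: no filtering.
        return True

    def handle(self, message):
        return f"{self.name} handled {message['command']}"

    def command_manager(self, message):
        self.activity_log.append(message)      # every Manager tracks the message
        if message["target"] == self.name:
            return self.handle(message)
        if not self.allow(message):
            return None                        # screened out by local policy
        if self.superior is None:
            raise LookupError(f"no route to {message['target']!r}")
        return self.superior.command_manager(message)


class ScreeningManager(Manager):
    """Stricter policy: forward only messages at or above a priority threshold."""

    def __init__(self, name, superior=None, min_priority=0):
        super().__init__(name, superior)
        self.min_priority = min_priority

    def allow(self, message):
        return message.get("priority", 0) >= self.min_priority
```

Because the policy lives in a single overridable method, alternative reporting policies can be tried without changing the routing logic itself.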

The HDC-Manager was designed in a modular
fashion to facilitate maintenance and extension.
API accessor utility methods conform to uniform
conventions for naming, argument call structures,
and parallel behavior. For example, Tasks
or other data structures can be modified by adding
attributes (e.g., timestamps for First-In-First-Out
ordering), or changing attribute names. The developer
merely changes the relevant Create-X function
(and possibly one of the API Check methods). New
HDC-Manager state variables and data objects to 
populate them can be added by implementing ap- 
propriate MetaData type-checking predicates, cre- 
ation and accessor functions, adding a case to the 
Create-Item API method, and extending the ta- 
ble that drives the Command-Manager and Task- 
Dispatcher control methods. 

Similarly, the HDC-Manager API control methods
share parallel argument call structures and can
be modified or extended selectively. For example,
the Initialize method can be customized to perform
any actions required to load and initiate all
subordinate Agents and their embedded applications.
Moreover, as noted above, alternative organizational
policies can be implemented to capture
logical relationships specific to particular distributed
systems. All such modifications are implemented
by creating HDC-Manager Agent subclasses
with custom methods that override the standard
methods defined by the root Agent class.
For instance, the default initialization behavior can
be redefined simply by creating a subclass Agent
with a new Initialize method that calls a Custom-Initialize
function. Specialization preserves the
structure of the original HDC-Manager Agent class
for use in applications where it is suitable. At the
same time, inheritance and functional abstraction
(through method dispatching) promote adaptability
and compact definitions for customized Manager
Agent subclasses.
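
The specialization pattern can be illustrated with a short Python sketch. The class names, the `loaded_agents` attribute, and the body of `custom_initialize` are hypothetical; calling the inherited method before the custom function is a design choice made here, not something the text prescribes.

```python
class HDCManagerAgent:
    """Stand-in for the root HDC-Manager Agent class."""

    def __init__(self):
        self.task_agenda = []
        self.bulletin_board = []

    def initialize(self):
        # Default behavior: reset the Manager state variables.
        self.task_agenda.clear()
        self.bulletin_board.clear()


def custom_initialize(manager):
    # Hypothetical application-specific action, e.g. loading subordinate Agents.
    manager.loaded_agents = ["Agent-A", "Agent-B"]


class CustomManagerAgent(HDCManagerAgent):
    """Subclass that overrides only the Initialize method."""

    def initialize(self):
        super().initialize()        # keep the standard reset
        custom_initialize(self)     # then run the custom actions
```

The root class is left untouched, so applications that need only the default behavior continue to use it unchanged.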

USING THE HDC-MANAGER AGENT 
FOR LAUNCH OPERATIONS SUPPORT 

Over the past decade, NASA Kennedy Space Cen- 
ter (KSC) has developed knowledge-based systems 
to increase automation of operations support tasks 
for the Space Shuttle fleet. Major applications include
operations support of the Shuttle Launch
Processing System, and monitoring, control, fault isolation
and management of on-board Shuttle systems
and Ground Support Equipment. Prototypes have
been tested successfully (off-line) in support of sev- 
eral Shuttle missions and are currently being ex- 
tended and refined for formal field testing and val- 
idation. Final deployment will require integrating 
these applications, both with one another and with 
existing Shuttle operations support systems. 

KSC recently initiated the EXODUS project 
(Expert Systems for Operations Distributed Users) 
to prepare for this challenging systems integra- 
tion task. As part of this effort, KSC is funding 
Symbiotics, Inc. to develop the SOCIAL toolset 
to help validate, refine, and ultimately implement
the proposed EXODUS architecture [Ad90a].
Proof-of-concept prototypes have been constructed
to demonstrate central EXODUS design concepts:
distributed data transfer; non-intrusive physical
distribution of knowledge bases from existing intelligent
systems to facilitate resource control and
sharing; and integration of expert systems and
databases via Gateway Agents for CLIPS, KEE,
and Oracle development tools. This section de- 
scribes a fourth EXODUS prototype, which used 
a SOCIAL HDC-Manager Agent to coordinate the 
fault isolation activities of two previously stan- 
dalone intelligent systems. 

Background on KSC Launch Operations 

Processing, testing, and launching of Shuttle vehicles
take place at facilities dispersed across the
KSC complex. Many activities, such as storing and
loading fuels and controlling the environments of
Shuttles on Launch Pads, require elaborate electromechanical
Ground Support Equipment. The
Launch Processing System (LPS) supports all Shut- 
tle preparation and test activities from arrival at 
KSC through to launch. The LPS provides the 
sole direct real-time interface between Shuttle engi- 
neers, Orbiter vehicles and payloads, and associated 
Ground Support Equipment [He87]. 

The locus of control for the LPS is the Fir- 
ing Room, an integrated network of computers, 
software, displays, controls, switches, data links
and hardware interface devices. Firing Room com- 
puters are configured to perform independent LPS 
functions through application software loads. Shut- 
tle engineers use computers configured as Consoles 
to remotely monitor and control specific vehicle and 
Ground Support systems. Each such application 
Console communicates with an associated Front- 
End Processor (FEP) computer that issues com- 
mands, polls sensors, and preprocesses sensor mea- 
surement data to detect significant changes and ex- 
ceptional values. These computers are connected to 
data busses and telemetry channels that interface 
with Shuttles and Ground Support Equipment. 

The LPS Operations team ensures that KSC’s 
four independent Firing Rooms are available con- 
tinuously, in appropriate error-free configurations, 
to support test requirements such as Launch Count- 
down or Orbiter Power-up sequences for the Shut- 
tle fleet. A dedicated Console computer is con- 
figured for these Operations support functions in 
each Firing Room. This computer displays messages
triggered by the LPS Operating System that
signal anomalous events such as improper register 
values or expiring process timers. The Operations 
Console supports other conventional programs for 
monitoring and retrieving Firing Room status data 
as well. 

OPERA (for Operations Analyst) consists of an 
integrated collection of expert systems that automates
many critical LPS Operations support functions
[Ad89b]. OPERA taps into the same data
stream of error messages that the LPS sends to
the Operations Console. OPERA’s primary ex- 
pert systems monitor the data stream for anoma- 
lies and assist LPS Operations users in isolating and 
managing faults by recommending troubleshooting, 
recovery and/or workaround procedures. In ef- 
fect, OPERA retrofits the Operations Console with 
knowledge-based fault isolation capabilities. The 
system is implemented in KEE on Texas Instruments
Lisp Machines.

GPC-X is a prototype expert system for isolating
faults in the Shuttle vehicle’s on-board computer
systems, or GPCs. GPC-X monitors (simulated)
PCM telemetry data to detect and isolate
faults in communications between Shuttle GPC
computers and their associated GPC-FEP computers
in LPS Firing Rooms. The GPC-X prototype
is implemented in CLIPS on a Sun Workstation.

Coordinating Fault Diagnosis with HDC- 
Managers 

One type of memory hardware fault in GPC computers
manifests itself during switchovers of Launch
Data Buses. These buses connect GPCs to GPC- 
FEPs until just prior to launch, when communica- 
tions are transferred to telemetry links. Unfortu- 
nately, the data stream available to GPC-X does 
not provide any visibility into the occurrence of 
Launch Data Bus switchovers. Thus, GPC-X can 
propose, but not test, certain fault hypotheses about
GPC problems. However, switchover events are 
monitored by the LPS Operating System, which 
triggers messages that can be detected by OPERA. 

Typical of the current generation of knowledge- 
based systems, OPERA and GPC-X were devel- 
oped independently of one another, using different 
representation schemes, reasoning and control mod- 
els, software and hardware platforms. More criti- 
cally, neither system possesses internal capabilities 
for modeling or communicating with (remote) peer 
systems. The EXODUS prototype demonstrates 
the use of SOCIAL Agents to rectify these shortcomings
(cf. Figure 6). The distributed application
uses two Knowledge Gateway Agents to integrate
OPERA and GPC-X. An HDC-Manager
Agent mediates interactions between the OPERA
and GPC-X Agents, coordinating their independent
fault isolation activities to obtain enhanced diagnostic
results.

Specifically, GPC-X, at the appropriate point
in its rule-based fault isolation activities, issues a
request to the HDC-Manager, via its Gateway Agent,
to check for Launch Data Bus switchovers. The
request is triggered by adding a simple consequent
clause of the form (GW-Return LDB-Switchover-Check)
to the CLIPS rule that proposes the memory
fault hypothesis. GW-Return is a custom
C function defined in the SOCIAL API
for embedding CLIPS. When the rule fires, GW-Return
interacts with the GPC-X CLIPS Gateway
Agent, causing it to formulate a message to
the HDC-Manager containing a Modify-Agenda request
to add a MetaData Task object for the LDB-Switchover-Check
service. The HDC-Manager’s
Command-Manager dispatches this request, causing
the Task to be posted to the Task-Agenda.

Next, the HDC-Manager executes the Control-Cycle
method, which results in the Task being processed
via the Task-Dispatcher. First, the Index
Knowledge Base is searched for a server Agent for
LDB-Switchover-Checks. The search identifies the 
OPERA Gateway Agent as a suitable server. Task 
data is then reformulated into a command message 
using the procedure specified by the Index Knowl- 
edge Base. The message is then passed to the 
OPERA Gateway Agent, whose in-filter method 
performs a search of the knowledge base used by 
OPERA to store and interpret LPS Operating Sys- 
tem error messages. The objective is to locate error 
messages (represented as KEE units) indicative of 
LDB Switchover events. Search results are encoded 
within a Manager Bulletin-Board MetaData object, 
which the OPERA Gateway’s out-filter method re- 
turns as a message containing an API command to 
post that object to the Manager’s Bulletin-Board. 
In this prototype, the OPERA Gateway contains 
all of the task processing logic: OPERA itself is 
a passive participant that continues its monitoring 
and fault isolation activities without significant in- 
terruption. 

The GPC-X CLIPS Gateway Agent queries the 
HDC-Manager to check the Bulletin-Board for a 
response to its LDB-Switchover-Check request and 
retrieves the results. The retrieved Bulletin-Board 
item is decoded and the answer is converted into 
a fact that is asserted into the GPC-X fact base. 
Finally, the Gateway activates the CLIPS rule en- 
gine to complete GPC fault diagnosis. Obviously, 
new rules have to be added to GPC-X to exploit 
the newly available hypothesis test data. However, 
all of the basic integration and coordination logic is 
supplied by the embedding GPC-X Gateway Agent 
or the HDC-Manager. 
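
The interaction sequence described above can be traced with a small, self-contained Python simulation. It is a sketch only: the class and method names are hypothetical stand-ins for the SOCIAL Agents, a list of strings replaces OPERA's KEE knowledge base, the "LDB SWITCHOVER" message text is invented for the example, and the asserted fact is represented as a tuple rather than a CLIPS fact.

```python
class HDCManager:
    """Minimal Manager: Task-Agenda, Bulletin-Board, and a service index."""

    def __init__(self):
        self.task_agenda = []
        self.bulletin_board = []
        self.agent_index = {}              # service name -> server Agent

    def modify_agenda(self, task):
        self.task_agenda.append(task)

    def modify_bulletin_board(self, entry):
        self.bulletin_board.append(entry)

    def check_bulletin_board(self, service):
        return [e for e in self.bulletin_board if e["service"] == service]

    def control_cycle(self):
        # Dispatch every pending Task to the server named in the index.
        while self.task_agenda:
            task = self.task_agenda.pop(0)
            self.agent_index[task["service"]].receive(task, self)


class OperaGateway:
    """Stand-in for the OPERA Gateway; OPERA itself is never called."""

    def __init__(self, error_messages):
        self.error_messages = error_messages

    def in_filter(self, task):
        # Search the stored error messages for switchover events.
        return [m for m in self.error_messages if "LDB SWITCHOVER" in m]

    def out_filter(self, task, hits, manager):
        # Post the search results to the Manager's Bulletin-Board.
        manager.modify_bulletin_board(
            {"service": task["service"], "found": bool(hits), "evidence": hits})

    def receive(self, task, manager):
        self.out_filter(task, self.in_filter(task), manager)


class GpcxGateway:
    """Stand-in for the GPC-X CLIPS Gateway."""

    def __init__(self, manager):
        self.manager = manager
        self.facts = []                    # plays the role of the fact base

    def gw_return(self, service):
        # Triggered by the rule that proposes the memory fault hypothesis.
        self.manager.modify_agenda({"service": service, "requester": "GPC-X"})

    def collect(self, service):
        for entry in self.manager.check_bulletin_board(service):
            self.facts.append(("ldb-switchover", entry["found"]))
        return self.facts
```

A typical run would register an `OperaGateway` under the "LDB-Switchover-Check" service, call `gw_return`, run `control_cycle`, and then call `collect`. Neither Gateway refers to the other; each communicates only with the Manager.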


[Diagram: the HDC-Manager (Intelligent Router) exchanges
messages with the Gateway Agents for the Shuttle GPC Computer
Expert System (GPC-X) and OPERA. Message labels: request for
Launch Data Bus info; task to search for Launch Data Bus info;
task results; response (to test LDB fault hypothesis).]

Figure 6: Hierarchical Coordination in EXODUS


This EXODUS prototype illustrates non-intrusive
system-level coordination of distributed
applications that solve problems at the subsystem
level of Shuttle Operations: neither OPERA nor
GPC-X is capable of accomplishing the task of
confirming or rejecting the GPC memory fault hypothesis
individually. GPC-X generates fault candidates,
but lacks sufficient resources to complete
diagnosis, which requires both generate and test capabilities.
OPERA automatically detects LPS error
messages that are relevant to GPC-X’s fault test requirements.
However, it lacks contextual knowledge
about GPC computers (their architecture, behavior,
fault modes and symptoms) needed to recognize the
significance of such data. OPERA also lacks the
capabilities to communicate its interpretations of
LPS data back to GPC-X to complete diagnosis.

Together with the Gateway application Agents, 
the HDC-Manager provides the links required to 
combine and utilize the otherwise isolated or frag- 
mented knowledge about Shuttle and Firing Room 
systems and their relationships to one another. The 
resulting coordination architecture is non-intrusive 
in that neither application was modified to include 
direct knowledge of the other system, its interfaces, 
knowledge model, or delivery platform. The HDC- 
Manager introduces an isolating layer of abstrac- 
tion; application Agents need only know how to 
communicate with the HDC-Manager to request 
services and retrieve responses for their embedded 
applications.

The proposed design for the complete EXODUS 
system specifies an extended HDC model to inte- 
grate and coordinate all of KSC’s operations sup- 
port applications across areas of functional over- 
lap. A SOCIAL HDC-Manager Agent will support 
architectural changes as EXODUS evolves through 
its lifecycle of initial deployment, maintenance, and 
enhancement. Initial EXODUS applications are 
loosely-coupled and will interact relatively infre- 
quently. Consequently, the HDC-Manager’s cen- 
tralized control strategy does not entail serious per- 
formance penalties. As new applications are added 
and interaction traffic increases, bottlenecks will 
be addressed by reorganizing the EXODUS con- 
trol architecture in terms of hierarchies of HDC- 
Manager Agents, much as growing human organi- 
zations evolve. 

RELATED WORK 

The HDC-Manager generalizes and extends a hi- 
erarchical distributed control (HDC) model orig- 
inally developed for NASA’s Operations Analyst 
(OPERA) system [Ad89a]. The OPERA version 
of the control model was implemented using distributed
blackboard objects [Ad89b]. OPERA requires
all applications to be co-resident and to be
implemented using KEE, and only supports a single
Manager and subordinate group. The extended
HDC model relaxes these restrictions using MetaCourier,
MetaData, API tools, and SOCIAL Agents.

Most research in Distributed Artificial Intelli- 
gence (DAI) has focused on domains involving a 
single complex problem, such as data fusion. Con- 
trol schemes have emphasized purely local coordi- 
nation methods to achieve cooperation among in- 
telligent systems [Bo88,Hu87]. For example, [Le83] 
employs a homogeneous collection of blackboards 
that interpret data from a (simulated) network of 
spatially distributed sensors to reconstruct vehicle 
positions and movements. Data and hypotheses
are shared across adjacent sensor regions. Solutions
emerge consensually, without global management.
Decentralized control entails significant performance
overhead from duplicated processing. As
argued earlier, localized coordination strategies can 
be cumbersome and difficult to maintain for hetero- 
geneous, evolving “multiple problem” DAI systems. 

The MACE DAI tool [Ga86] incorporates man- 
ager agents for centralized routing of messages 
among agents, closely resembling the organizing 
role played by SOCIAL HDC-Managers. How- 
ever, it is not clear that MACE managers can 
be configured in multi-level hierarchies. Moreover, 
MACE managers do not provide shared memory 
Bulletin-Boards. MACE and several other dis- 
tributed system tools such as ABE [Ha88] and 
CRONUS [Sh86] insulate developers from low level 
distributed computing functions and support mes- 
sage sending among processes across computer net- 
works. However, unlike SOCIAL, these tools do not 
implement generic distributed services in uniformly 
object-oriented layered modules that are accessible 
to developers for customizing. In addition, SO- 
CIAL provides more extensive tools for integrating 
across languages and software development shells.

FUTURE WORK 

Future development will extend SOCIAL’s library 
of Manager Agents. New organizational
models will explore alternative types of nonhierarchical
cooperative coupling. For example, we are investigating
control behaviors based on group-based
tasking [Br89], as one approach to providing fault
tolerance in distributed systems: groups can be
used to define sets of application Agents that duplicate
support for given services. A group Agent that
could detect loss of an Agent configured to provide
a service (e.g., due to dropped network links or host
platform failures) could activate another member
Agent to resume the service. Redundancy and recoverability
are critical prerequisites for distributed
systems in mission-critical space and military ap- 
plications. We also plan to implement C versions 
of SOCIAL Manager and Gateway Agents that are 
currently Lisp-based. 
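
As a purely illustrative sketch of the group-based failover idea mentioned above, the following Python fragment shows one way a group Agent might promote a standby member when the active one is lost. The `ServiceGroup` class and its methods are hypothetical and do not describe an existing SOCIAL facility; how loss would actually be detected is left out.

```python
class ServiceGroup:
    """Hypothetical group Agent whose members duplicate support for a service."""

    def __init__(self, service, members):
        if not members:
            raise ValueError("a group needs at least one member Agent")
        self.service = service
        self.members = list(members)       # ordered by preference
        self.active = self.members[0]

    def report_loss(self, member):
        """Drop a lost member and promote the next one if it was active."""
        self.members.remove(member)
        if member == self.active:
            if not self.members:
                raise RuntimeError(f"no member left to provide {self.service!r}")
            self.active = self.members[0]
        return self.active
```

Losing a standby member leaves the active Agent unchanged, while losing the active Agent hands the service to the next member in the list.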

CONCLUSIONS

Development tools for distributed intelligent sys- 
tems must be modular and non-intrusive to: (a) 
facilitate integration of existing, standalone sys- 
tems “after the fact;” and (b) minimize lifecycle 
costs for maintaining, enhancing, and re-verifying 
systems. Tools for building distributed systems 
must be able to coordinate as well as integrate 
autonomous application elements. Coordination is 
necessary to manage large numbers of dynamic in- 
teraction pathways and to capture complex organi- 
zational relationships among application elements. 
SOCIAL provides a unified set of object-oriented 
tools that address all of these requirements. The 
HDC-Manager Agent realizes a hierarchical dis- 
tributed control model that adopts a highly cen- 
tralized approach to coordination. Developers use 
a high-level Application Programming Interface to 
access the HDC-Manager’s coordination capabili- 
ties. The API conceals lower-level SOCIAL tools 
for transparent distributed communication, control, 
and data management.

SOCIAL has broad applicability for distributed 
intelligent systems that are being developed in 
space-related domains. Knowledge-based and con- 
ventional tools for managing and analyzing data 
need to be coupled to help space scientists explore 
and utilize NASA’s growing stores of astronomi- 
cal and environmental information. Linking short- 
and long-term scheduling and planning tools will 
improve decision support capabilities for complex 
space missions. EXODUS-like architectures can in- 
crease automation of operations support by coor- 
dinating autonomous tools across subsystems and 
functional areas (e.g., configuration, anomaly de- 
tection, diagnosis and correction). Example do- 
mains include payload and Shuttle processing, com- 
puter and communications networks, and vehicle or 
Space Station subsystems (e.g., power generation, 
power distribution, mission payloads, life support). 
Finally, flight and mission control centers can en- 
hance automation and safety in directing launches, 
satellites, and space probes, by combining decision 
and operations support tools into fully unified, co- 
operating systems. 